Task Reallocation for Maximal Reliability in Distributed Computing Systems with Uncertain Topologies and Non-Markovian Delays
Abstract
The ability to model and optimize reliability is central in designing survivable distributed computing systems (DCSs) where servers are prone to fail permanently. In this paper the service reliability of a DCS in uncertain topologies is analytically characterized by using a novel regeneration-based probabilistic analysis. The analysis takes into account the stochastic failure times of servers, the heterogeneity and randomness of both service times and communication delays, as well as arbitrary task-reallocation policies. Auxiliary age variables are introduced in the analysis to capture the memory associated with the non-Markovian (non-exponential) communication and service random times, thereby enabling the recursive analytical characterization of reliability. Implications of the non-exponential times on reliability are studied, and the results are compared to those obtained using a Markovian formulation; in particular, the effect of increasing the mean communication times is investigated. The model is further used to solve the optimization problem of task-reallocation for maximal reliability, and the results are compared to those from Monte-Carlo simulations and actual experiments conducted on a small-scale DCS over the Internet.
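As a rough illustration of the kind of Monte-Carlo comparison mentioned above, the following is a minimal sketch, not the paper's regeneration-based model. It assumes a two-server system, a single reallocation of L tasks from server 1 to server 2 at time zero, exponential failure times, and a success criterion of "every task is served before its server fails". All function names, parameters, and the Pareto choice for the non-exponential case are hypothetical.

```python
import random

def estimate_reliability(m1, m2, L, service, delay, fail_mean, n=20000, seed=0):
    """Monte Carlo estimate of P(all tasks are served before their server fails).

    Assumed toy model: server 1 sends L of its m1 tasks to server 2 at time 0;
    the batch arrives after one random delay. Failure times are exponential.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n):
        y1 = rng.expovariate(1.0 / fail_mean[0])  # failure time of server 1
        y2 = rng.expovariate(1.0 / fail_mean[1])  # failure time of server 2
        # Server 1 serves the tasks it keeps, one after another.
        t1 = sum(service(rng, 0) for _ in range(m1 - L))
        # Server 2 serves its own tasks, then the transferred batch once it arrives.
        t2 = sum(service(rng, 1) for _ in range(m2))
        if L > 0:
            t2 = max(t2, delay(rng)) + sum(service(rng, 1) for _ in range(L))
        if t1 < y1 and t2 < y2:
            successes += 1
    return successes / n

def exp_service(mean):
    """Markovian case: exponential service times with the given per-server means."""
    return lambda rng, server: rng.expovariate(1.0 / mean[server])

def pareto_service(mean, alpha=3.0):
    """Non-exponential case: Pareto service times scaled to the same means."""
    scale = (alpha - 1.0) / alpha  # paretovariate(alpha) has mean alpha / (alpha - 1)
    return lambda rng, server: mean[server] * scale * rng.paretovariate(alpha)

def best_reallocation(m1, m2, **kwargs):
    """Brute-force search over L for the highest estimated reliability."""
    return max(range(m1 + 1),
               key=lambda L: estimate_reliability(m1, m2, L, **kwargs))
```

Passing `exp_service` or `pareto_service` with identical means gives a crude view of how the service-time distribution alone changes the estimate, and increasing the mean of `delay` shows the effect of slower communication under these assumptions.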
Similar resources
A Framework for Task Reallocation in Distributed Computing Systems with Non-Markovian Communication Delays
This paper presents a general framework for optimal task reallocation in heterogeneous distributed computing systems (DCSs). The framework relies on a rigorous analytical model for the stochastic execution time of a workload. The model takes into account the heterogeneity and randomness of both service times and communication delays, an arbitrary task-reallocation policy as well as the stochast...
New Robust Stability Criteria for Uncertain Neutral Time-Delay Systems With Discrete and Distributed Delays
In this study, the delay-dependent robust stability problem is investigated for uncertain neutral systems with discrete and distributed delays. By constructing an augmented Lyapunov-Krasovskii functional involving triple integral terms and taking into account the relationships between the different delays, new less conservative stability and robust stability criteria are established first using the...
Green Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...
DisTriB: Distributed Trust Management Model Based on Gossip Learning and Bayesian Networks in Collaborative Computing Systems
The interactions among peers in Peer-to-Peer systems, as distributed collaborative systems, are based on asynchronous and unreliable communications. Trust is an essential and facilitating component in these interactions, especially in such uncertain environments. Various attacks that affect trust are possible due to the large-scale nature and openness of these systems. Peers do not have enough inform...
Synchronization criteria for T-S fuzzy singular complex dynamical networks with Markovian jumping parameters and mixed time-varying delays using pinning control
In this paper, we discuss the issue of synchronization for singular complex dynamical networks with Markovian jumping parameters and additive time-varying delays through pinning control by Takagi-Sugeno (T-S) fuzzy theory. The complex dynamical systems consist of m nodes and the systems switch from one mode to another, a Markovian chain with glorious transition probabili...